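Chapter 2: Getting started with qplot

The dataset

The diamonds dataset records the four "C"s of diamond quality (carat weight (carat), cut, colour (color), and clarity) and five physical measurements (depth, table width (table), x, y, z), together with the price of each diamond. A random sample of ten rows: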

##       carat       cut color clarity depth table price    x    y    z
## 16011  1.33   Premium     H     SI2  62.8  52.0  6405 7.15 7.06 4.46
## 42666  0.56     Ideal     F     SI1  61.5  55.0  1334 5.30 5.34 3.27
## 22048  0.31     Ideal     I     VS1  62.8  57.0   628 4.32 4.28 2.70
## 53655  0.76     Ideal     D     SI2  62.2  57.0  2706 5.85 5.83 3.63
## 52046  0.55 Very Good     G    VVS1  61.5  55.1  2451 5.26 5.28 3.24
## 28559  0.30     Ideal     H     VS1  62.1  54.0   675 4.35 4.32 2.69
## 26581  1.66 Very Good     F     VS2  61.4  58.0 16294 7.63 7.68 4.70
## 20715  0.31     Ideal     F     VS2  61.9  54.0   625 4.35 4.38 2.70
## 44985  0.31     Ideal     G     SI2  61.7  56.0   523 4.38 4.34 2.69
## 8184   1.02     Ideal     G     SI2  62.0  55.0  4366 6.40 6.51 4.00

Basic usage

Draw a scatterplot showing the relationship between the price and the weight of the diamonds.

library(ggplot2)  # provides qplot() and the diamonds dataset
qplot(carat, price, data = diamonds)

Functions of variables, such as log(), can also be passed as arguments:

qplot(log(carat),log(price),data = diamonds)

The relationship between the volume of a diamond (approximated by x * y * z) and its weight:

qplot(carat,x*y*z,data = diamonds)

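Add colour and cut information to the scatterplot of weight against price by mapping them to the colour and shape aesthetics: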

dsmall <- diamonds[sample(nrow(diamonds), 100), ]  # random sample of 100 diamonds
qplot(carat,price,data = dsmall,colour = color)

qplot(carat,price,data = dsmall,shape = cut)

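Use the alpha aesthetic, which ranges from 0 (completely transparent) to 1 (completely opaque). Wrapping the value in I() sets it as a constant instead of mapping it to a variable: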

qplot(carat,price,data = diamonds,alpha = I(1/10))

qplot(carat,price,data = diamonds,alpha = I(1/100))

qplot(carat,price,data = diamonds,alpha = I(1/200))
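
The effect of alpha on overplotted regions can be estimated with a little arithmetic. The sketch below assumes standard "over" alpha compositing, in which k overlapping points with the same alpha reach a combined opacity of 1 - (1 - alpha)^k. The helper stacked_opacity is a hypothetical name used for illustration and is not part of ggplot2.

```r
# Combined opacity of k identical semi-transparent points drawn on top of
# each other, assuming standard "over" alpha compositing.
# stacked_opacity is a hypothetical helper, not a ggplot2 function.
stacked_opacity <- function(alpha, k) {
  1 - (1 - alpha)^k
}

alphas <- c(1/10, 1/100, 1/200)

# Opacity reached where 10, 100 and 200 points overlap, for each alpha value
overlap <- sapply(c(10, 100, 200), function(k) stacked_opacity(alphas, k))
dimnames(overlap) <- list(alpha = c("1/10", "1/100", "1/200"),
                          points = c("10", "100", "200"))
round(overlap, 2)
```

Under this assumption a region stays faint unless the number of overlapping points is comparable to the denominator of alpha, which is why smaller alpha values make it easier to see where the bulk of the data lies.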

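Geometric objects (geom)

A geom describes the type of object used to display the data, and some geoms come with an associated statistical transformation. By changing the geom, qplot() can produce almost any kind of plot.

Two variables

  • geom = "point" draws a scatterplot.
  • geom = "smooth" fits a smooth curve to the data.
  • geom = "boxplot" draws a box-and-whisker plot.
  • geom = "path" and geom = "line" draw lines between the data points.

One continuous variable

  • geom = "histogram" draws a histogram.
  • geom = "freqpoly" draws a frequency polygon.
  • geom = "density" draws a density curve.

One discrete variable

  • geom = "bar" draws a bar chart.

Adding a smoother to a plot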

dsmall1 <- diamonds[sample(nrow(diamonds),100),]
qplot(carat,price,data = dsmall1,geom = c("point","smooth"))
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

qplot(carat,price,data = diamonds,geom = c("point","smooth"))
## geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.

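  • The method argument selects among different smoothers:

method = "loess", the default when n is small, uses local regression. More details about the algorithm can be found in ?loess. The wiggliness of the curve is controlled by the span argument, which ranges from 0 (very wiggly) to 1 (very smooth).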

qplot(carat,price,data = dsmall1,geom = c("point","smooth"),span =0.2)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

qplot(carat,price,data = dsmall1,geom = c("point","smooth"),span=1)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Loess is not well suited to large datasets (its memory use is O(n^2)), so a different smoothing algorithm is used by default when n exceeds 1000.
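
To see what span does outside of qplot(), the sketch below calls loess() directly. It uses simulated data and not diamonds: the sine curve, the noise level and the seed are assumptions chosen only for illustration. The wiggliness of each fit is measured by the sum of squared second differences of the fitted values, via a hypothetical helper named roughness.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)   # simulated data, not from diamonds

# Fit the same data with a small and a large span
fit_wiggly <- loess(y ~ x, span = 0.2)
fit_smooth <- loess(y ~ x, span = 1)

# Roughness: sum of squared second differences of the fitted curve
# (hypothetical helper, larger values mean a more wiggly curve)
roughness <- function(fit) sum(diff(fitted(fit), differences = 2)^2)

c(span_0.2 = roughness(fit_wiggly), span_1 = roughness(fit_smooth))
```

The fit with span = 0.2 uses only the nearest 20% of the points for each local regression, so it follows local noise more closely than the fit with span = 1.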

Boxplots and jittered points

qplot(color,price/carat,data = diamonds,geom = "jitter",alpha = I(1/5))

qplot(color,price/carat,data = diamonds,geom = "jitter",alpha = I(1/50))

qplot(color,price/carat,data = diamonds,geom = "jitter",alpha = I(1/200))

Histograms and density plots

qplot(carat,data = diamonds,geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(carat,data = diamonds,geom = "density")

For histograms, the binwidth argument sets the width of the bins and thereby controls the amount of smoothing:

qplot(carat,data = diamonds,geom = "histogram",binwidth = 1,xlim = c(0,3))

qplot(carat,data = diamonds,geom = "histogram",binwidth = 0.1,xlim = c(0,3))

qplot(carat,data = diamonds,geom = "histogram",binwidth = 0.01,xlim = c(0,3))
## Warning: position_stack requires constant width: output may be incorrect

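When a categorical variable is mapped to an aesthetic, the geom is automatically split by that variable, so the commands below tell qplot() to draw a density curve and a histogram for each diamond colour.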

qplot(carat,data = diamonds,geom = "density",colour = color)

qplot(carat,data = diamonds,geom = "histogram",fill = color)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Bar charts

qplot(color,data=diamonds,geom = "bar")

qplot(color,data = diamonds,geom = "bar",weight = carat) + scale_y_continuous("carat")
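
The second command uses weight = carat, so each bar shows the total carat weight for a colour and not the number of diamonds. A minimal sketch of the same two tallies computed directly, assuming ggplot2 is installed so that the diamonds data is available (the variable names counts and carat_totals are chosen here for illustration):

```r
library(ggplot2)  # provides the diamonds dataset

# Unweighted bars: number of diamonds of each colour
counts <- table(diamonds$color)

# Weighted bars (weight = carat): total carat weight of each colour
carat_totals <- tapply(diamonds$carat, diamonds$color, sum)

counts
round(carat_totals, 1)
```

Comparing the two tables gives a quick check on the bar heights in each plot.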

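Line plots for time series

Line plots join the points from left to right, while path plots join them in the order in which they appear in the dataset (a line plot is equivalent to sorting the data by x value and then drawing a path plot). Line plots usually have time on the x-axis and show how a single variable changes over time. Path plots show how two variables change together over time, with time encoded in the order of the points.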

qplot(date,unemploy/pop,data = economics,geom = "line")

qplot(date,uempmed,data = economics,geom = "line")
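
Both commands above are line plots. As a sketch of the path plot described earlier, the variant below plots the unemployment rate against the median length of unemployment and joins the points in row order, which in economics is date order. Overlaying the points with geom = c("point", "path") and the names p_path and p_line are illustrative choices, not something this section prescribes.

```r
library(ggplot2)  # provides qplot() and the economics dataset

# Path plot: points are connected in the order they appear in the data,
# so time runs along the path and not along an axis.
p_path <- qplot(unemploy / pop, uempmed, data = economics,
                geom = c("point", "path"))

# Line plot of the same two variables: points are connected in order of x,
# which discards the time ordering.
p_line <- qplot(unemploy / pop, uempmed, data = economics, geom = "line")

p_path
```

Printing p_line afterwards shows how different the same data looks once the time ordering is lost.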

Faceting

qplot(carat,data = diamonds,facets = color ~ .,geom = "histogram",binwidth = 0.1,xlim = c(0,3))

qplot(carat,..density..,data = diamonds,facets = color ~.,geom = "histogram",binwidth = 0.1,xlim = c(0,3))